Adaptively Scheduling Parallel Loops in Distributed Shared-Memory Systems

Authors

Yong Yan

Canming Jin

Xiaodong Zhang

Abstract

Using runtime information about load distributions and processor affinity, we propose an adaptive scheduling algorithm and several variations of it that use different control mechanisms. The proposed algorithm applies different degrees of aggressiveness when adjusting loop scheduling granularities, with the aim of improving the execution performance of parallel loops by making scheduling decisions that match the real workload distributions at runtime. We experimentally compared the performance of our algorithm and its variations with several existing scheduling algorithms on two parallel machines: the KSR-1 and the Convex Exemplar. The kernel application programs used for performance evaluation were carefully selected to represent different classes of parallel loops. Our results show that using runtime information to adaptively adjust scheduling granularity is an effective way to handle loops with a wide range of load distributions when no prior knowledge of the execution is available. The overhead of collecting runtime information is insignificant in comparison with the performance improvement. Our experiments show that the adaptive algorithm and its five variations outperformed the existing scheduling algorithms.
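To make the idea of adjusting scheduling granularity from runtime information concrete, here is a minimal sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes a guided self-scheduling style baseline (chunk = remaining iterations divided by processor count) and a hypothetical rule that scales the chunk by how fast the requesting processor ran compared with the average. The names `guided_chunk` and `adaptive_chunk`, the timing inputs, and the clamp range 0.5 to 2.0 are all illustrative assumptions, and processor affinity is not modeled.

```python
import math


def guided_chunk(remaining: int, n_procs: int) -> int:
    """Baseline chunk size: ceil(remaining / n_procs), at least 1."""
    return max(1, math.ceil(remaining / n_procs))


def adaptive_chunk(remaining: int, n_procs: int,
                   my_time: float, mean_time: float) -> int:
    """Hypothetical adaptive rule (an assumption, not the paper's method).

    my_time   : measured time per iteration on the requesting processor
    mean_time : measured time per iteration averaged over all processors
    A processor that ran faster than average gets a larger chunk,
    a slower one gets a smaller chunk.
    """
    base = guided_chunk(remaining, n_procs)
    if my_time <= 0 or mean_time <= 0:
        # No usable runtime measurement yet: fall back to the baseline.
        return base
    # Relative speed, clamped so a single noisy sample cannot dominate.
    factor = min(2.0, max(0.5, mean_time / my_time))
    # Never hand out more iterations than remain, and never fewer than one.
    return max(1, min(remaining, int(base * factor)))


if __name__ == "__main__":
    # 100 iterations left, 4 processors, three different observed speeds.
    for t in (0.5, 1.0, 2.0):
        print(t, adaptive_chunk(100, 4, my_time=t, mean_time=1.0))
```

The clamp is one simple way to express a "degree of aggressiveness": widening the range lets the scheduler react more strongly to measured imbalance, while narrowing it keeps chunk sizes closer to the baseline.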

Related resources

Evaluation of Loop Scheduling Algorithms on Distributed Memory Systems

Loops are the largest source of parallelism in many applications. All prior DOALL loop scheduling algorithms such as Self-Scheduling, Guided Self-Scheduling, Trapezoid Self-Scheduling, and Factoring try to achieve workload balance through decreasing chunk sizes. Moreover, they have been analyzed only for shared memory platforms. In this work, the prior loop scheduling methods will be evaluated ...

Load Balancing for Parallel Loops in Workstation Clusters

Load imbalance is a serious impediment to achieving good performance in parallel processing. Global load balancing schemes cannot adequately manage to balance parallel tasks generated from a single application. Dynamic loop scheduling methods are known to be useful in balancing parallel loops on shared-memory multiprocessor machines. However, their centralized nature causes a bottleneck even fo...

Simple Code Generation for special UDLs

This paper focuses on transforming sequential perfectly nested loops into their equivalent parallel form. A special category of FOR nested loops is the uniform dependence loops (UDLs), which yield efficient parallelization techniques. An automatic code generation tool for shared and distributed memory machines, has been developed in order to automatically parallelize these perfectly nested loop...

Scheduling User-Level Threads on Distributed Shared-Memory Multiprocessors

In this paper we present Dynamic Bisectioning or DBS, a simple but powerful comprehensive scheduling policy for user-level threads, which unifies the exploitation of (multidimensional) loop and nested functional (or task) parallelism. Unlike other schemes that have been proposed and used thus far, DBS is not constrained to scheduling DAGs or singly nested parallel loops. Rather, our policy enco...

The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor

Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches. The simulations indicate that to maximize memory performance it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch singlewo...

Journal title:

IEEE Trans. Parallel Distrib. Syst.

Volume 8

Publication date: 1997
